gemma : more consistent attention scaling for v2 and v3 #13951
Merged
fix #12433 (comment)
I suspect the reference configs for Gemma 27B v2 and v3 are borked:
https://github.com/google/gemma_pytorch/blob/014acb7ac4563a5f77c76d7ff98f31b568c16508/gemma/config.py#L173
https://github.com/google/gemma_pytorch/blob/014acb7ac4563a5f77c76d7ff98f31b568c16508/gemma/config.py#L289
It does not make sense to normalize the Q tensor with hidden_size / num_heads; it should be normalized with head_size, like all other models. This change improves PPL and fixes the catastrophic generation at large contexts (see #12433 (comment)).
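For concreteness, a minimal sketch of the two scalings using the 27B v2 dimensions from the reference config linked above (variable names are illustrative, not the actual llama.cpp code). The two factors only diverge because hidden_size / num_heads != head_size for this model:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const int hidden_size = 4608; // Gemma 2 27B embedding width
    const int num_heads   = 32;   // attention heads
    const int head_size   = 128;  // per-head dim; note num_heads * head_size != hidden_size

    // reference config: query_pre_attn_scalar = hidden_size / num_heads = 144
    const float scale_config = 1.0f / std::sqrt((float) (hidden_size / num_heads));

    // this PR: scale by head_size, as every other architecture does
    const float scale_head = 1.0f / std::sqrt((float) head_size);

    // prints: config scale: 0.0833, head_size scale: 0.0884
    std::printf("config scale: %.4f, head_size scale: %.4f\n", scale_config, scale_head);
    return 0;
}
```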